CCT protein expression promoter

ABSTRACT

The present invention provides a novel isolated and purified nucleic acid molecule having a nucleotide sequence that in nature directs transcription of CTP:phosphocholine cytidylyltransferase (CCT), i.e., is a novel CCT promoter, and methods of use thereof.

The invention was made with the support of NIH Grant No. R01 HL 55584. The U.S. government has certain rights in the invention.

FIELD OF INVENTION

The present invention relates generally to the field of molecular biology. More specifically, it relates to the regulation of gene expression.

BACKGROUND OF INVENTION

Promoters and other regulatory components from bacteria, viruses, fungi and animals have been used to control gene expression in animal cells. Numerous transformation experiments using DNA constructs comprising various promoter sequences fused to various foreign genes, such as bacterial marker genes, have led to the identification of useful promoter sequences.

The limitations of currently available eukaryotic promoters include the fact that they lack sufficient activity when used to induce expression of proteins, are too large to be packaged into adenoviral vectors for in vivo gene delivery, or succumb to promoter “turn-off” by inflammatory factors elaborated by the host. In addition, such promoters are often based on sequences obtained from viruses.

There is, therefore, an unmet need for a eukaryotic promoter that exhibits robust activity, several-fold greater than that of the presently known promoters.

SUMMARY OF INVENTION

The present invention provides an isolated and purified nucleic acid promoter molecule that in nature directs transcription of CTP:phosphocholine cytidylyltransferase (CCT), a key, ubiquitous regulatory enzyme of phospholipid synthesis present in nucleated (eukaryotic) cells. This nucleotide sequence may be a CCTp208 promoter (SEQ ID NO:1) or a mutated CCTp240 promoter (mCCTp240) (SEQ ID NO:2).

SEQ ID NO:1 is the following:

CCCTCTGGAAGCGGAACTACTCTGTCAGGTTGTGGTTTTCAGGAATGCGGAGGTGGCATTGACAAGAGGGCGGGCGGGAGGCGGGACTTCCGGTCCGCAGTCCGGTCAGATGTTTCCCGGGCGTCTCCCCGCAACCCATTTGACTTCGCTAGTCGGTGACGCGGCGCGGGGAAGGGATCCGAGGGGGACCGGAGCCTGGAGGAGTTGA

SEQ ID NO:2 is the following:

AGCGTTCGGCTCAGTAAACCCACGCGCCCGGCCCCTCTGGAAGCGGAACTACTCTGTCAGGTTGTGGTTTTCAGGAATGCGGAGGTGGCATTGACAAGAGGGCGGGCGGGAGGCGGGACTTCCGGTCCGCAGTCCGGTCAGATGTTTCCCGGGCGTCTCCCCGCAACCCATTTGACTTCGCTAGTCGGTGACGCGGCGCGGGGAAGGGATCCGAGGGGGACCGGAGCCTGGAGGAGTTGA

The mutated CCTp240 is identical to the native CCTp240 sequence except that the putative sterol regulatory element region within the 5′ flanking region (−156/−147 bp relative to the first transcriptional start site of the CCT gene) has been mutated.

CCT promoters of the present invention confer high levels of expression of operably linked preselected DNA segments in eukaryotic cells, in particular in an animal cell, such as a mammalian cell. The nucleotide sequence of a promoter of the present invention may be substantially identical to a polynucleotide encoded by SEQ ID NO:1 or SEQ ID NO:2 or a fragment (portion) of SEQ ID NO:1 or SEQ ID NO:2, and is biologically active. In other words, the present invention includes promoters that are substantially similar to SEQ ID NO:1 or SEQ ID NO:2. Another embodiment of the invention is a CCTp208 or mCCTp240 promoter that comprises the minimum number of contiguous nucleotides that initiate RNA transcription.

CCT promoters of the present invention exhibit at least about a 2-fold increase in expression above a PGL3Pro promoter. The promoter may exhibit at least about a 10-fold, 20-fold, or even 30-fold increase in expression above a PGL3Pro promoter.

Also provided are compositions, expression cassettes (e.g., recombinant vectors), and host cells, comprising the nucleic acid molecule that comprises a nucleic acid segment that encodes a CCTp208 or mCCTp240. In one embodiment, the invention provides an expression cassette or vector containing an isolated nucleic acid molecule having a CCTp208 or mCCTp240 promoter nucleotide sequence that directs transcription of a linked nucleic acid segment in a cell, which nucleotide sequence is optionally operably linked to other suitable regulatory sequences, e.g., a transcription terminator sequence, operator, repressor binding site, transcription factor binding site and/or an enhancer. This expression cassette or vector may be contained in a host cell. The expression cassette or vector may augment the genome of a host cell or may be maintained extrachromosomally. The expression cassette may be operatively linked to a structural gene, an open reading frame thereof, or a portion thereof.

The present invention further provides a method of augmenting a eukaryotic genome by contacting a eukaryotic cell with a nucleic acid molecule of the invention, e.g., one having a CCTp208 or mCCTp240 promoter nucleotide sequence that directs transcription of a linked nucleic acid segment so as to yield a transformed eukaryotic cell. The nucleic acid molecule may be present inside or outside the nucleus, such as in the mitochondria of the cell.

The present invention also provides host cells made by the method described above.

The present invention further provides host cells containing a CCTp208 or mCCTp240 promoter described above, or the expression cassettes described above.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows the effect of LPDS on CCT mRNA in MLE cells. (A) CCT mRNA or (B) 18S RNA was determined by Northern analysis. (C) CCT mRNA levels in FBS and LPDS exposed cells as determined by Northern analysis using poly(A)-rich RNA. (D) Densitometric analysis of CCT mRNA in cells.

FIG. 2 shows the effect of LPDS on turnover of CCT mRNA in MLE cells. Northern analysis of total cellular RNA (A) CCT mRNA and (B) 18S RNA. (C) Densitometric analysis of autoradiograms shows CCT mRNA/18S OD ratios.

FIG. 3 shows the effect of LPDS on CCT gene transcription in MLE cells.

FIG. 4 shows deletional analysis of the CCT promoter in MLE cells.

FIG. 5 shows that LPDS activates the CCT promoter in MLE cells.

FIG. 6 shows that LPDS activation of the CCT promoter is not species specific.

FIG. 7 shows that LPDS activation of the CCT promoter requires an intact SRE.

FIG. 8 shows the promoter activity of CCT.

FIG. 9 shows that LPDS activates the CCT promoter in MLE cells.

DETAILED DESCRIPTION OF THE INVENTION

Phosphatidylcholine (PC) is the major structural phospholipid of eukaryotic cell membranes, including mammalian membranes, and plays a significant role in the production of intracellular messenger molecules. The biosynthesis of PC occurs mainly through the CDP-choline pathway, but in the liver PC is also made by the methylation of phosphatidylethanolamine. The rate of PC biosynthesis is governed tightly in most instances by the rate of conversion of phosphocholine to CDP-choline, a very slow (rate-limiting) reaction catalyzed by CTP:phosphocholine cytidylyltransferase (CCT). CCT activity is regulated by the reversible translocation between the membranes where it is active and a soluble form that has low activity. Enzyme-membrane interactions play an important role in the activation, and translocation to the membranes can be achieved either by dephosphorylation or by the presence of lipids. Protein phosphorylation/dephosphorylation may also be an important regulatory mechanism for the modulation of CCT activity and translocation, but the specific kinases involved in vivo remain to be identified. CCT purified from rat liver is a homodimer with a subunit molecular mass of 42 kDa. cDNAs for CCT have been cloned from yeast (Tsukagoshi et al., Eur. J. Biochem. (1987) 169:477-486), rat (Kalmar et al., Proc. Nat'l Acad. Sci. USA (1990) 87:6029-6033), CHO cells (Sweitzer et al., Arch. Biochem. Biophys. (1994) 311:207-226), HeLa cells (Kalmar et al., Biochim. Biophys. Acta (1994) 1219:328-334), mouse (Rutherford et al., Genomics (1993) 18:698-701), Arabidopsis thaliana (Choi et al., Mol. Cells (1997) 7:58-63), Plasmodium falciparum (Yeo et al., Eur. J. Biochem. (1995) 233:62-72), and Brassica napus (Hashida et al., Plant Mol. Biol. (1996) 316:205-211). In humans two isoforms of CCT, CCT-alpha and CCT-beta, have been identified (Lykidis et al., J. Biol. Chem. (1998) 273:14022-14029).

A genomic clone containing the last two exons and the 3′-noncoding sequence of the murine CCT gene (Ctpct) was cloned (Rutherford et al., Genomics (1993) 18:698-701). Ctpct was subsequently identified as a single copy gene located on mouse chromosome 16. Tang et al. isolated and characterized the entire coding region of the Ctpct gene (Tang et al., J. Biol. Chem. (1997) 272:13146-13151). Bakovic et al. reported the structural and functional organization of the 5′-flanking sequence of the Ctpct gene. They delineated several promoter regions involved in the activation and repression of transcription (Bakovic et al., Biochim. Biophys. Acta (1999) 1438:147-165). In particular, they studied the following regions of the mouse Ctpct promoter: −2068/−418, −2068/+38, −1268/+38, −210/+38, −130/+38, −90/+38, −52/+38, −10/+38, and +15/+38.

The present invention describes novel DNA sequences that encode a ubiquitous mammalian enzyme, CCT. The specific gene sequences discovered by the inventors were termed the CCTp208 and mCCTp240 promoters. These promoters significantly stimulate the expression of genes in a variety of cell types. The CCTp208 and mCCTp240 promoters exhibit robust activity, several-fold greater than that of some existing viral promoters (e.g., the strong viral SV40 promoter). The relatively small size of these mammalian sequences allows for insertion into gene transfer vectors. These observations indicate that these CCT promoters may be uniquely significant, making them useful in molecular biology (e.g., reporter gene expression) and protein engineering. The applications of these promoters include the generation of recombinant proteins, reporter gene expression, and incorporation into vectors used in gene delivery and immunization protocols.

These novel promoter sequences exhibit high levels of activity contained within a relatively small DNA region. These properties have three potential advantages compared to currently available promoters: (1) easy packaging into adenoviral vectors used to deliver genes exogenously, (2) ability to retain enhanced expression of desired genes and (3) since the CCT promoter is not a viral-based sequence, there is a potential to circumvent effects of inflammatory host response that turn off activity of commonly available promoters.

These gene sequences are potent, yet their activity is confined to a relatively small sequence of DNA; they can thus be linked to a gene transfer vector (e.g., an adenovirus) to drive the expression of other genes used to express proteins in a stable fashion. These CCT promoters exhibit a 31-fold increase in expression above a comparable, currently available commercial promoter, PGL3Pro (Promega, Inc.), when tested using lung cells.

Definitions

“Promoter” refers to a nucleotide sequence, usually upstream (5′) to its coding sequence, that controls the expression of the coding sequence by providing the recognition for RNA polymerase and other factors required for proper transcription. “Promoter” includes a minimal promoter that is a short DNA sequence comprised of a TATA-box and other sequences that serve to specify the site of transcription initiation, to which regulatory elements are added for control of expression. “Promoter” also refers to a nucleotide sequence that includes a minimal promoter plus regulatory elements that is capable of controlling the expression of a coding sequence or functional RNA. This type of promoter sequence consists of proximal and more distal upstream elements, the latter elements often referred to as enhancers. Accordingly, an “enhancer” is a DNA sequence that can stimulate promoter activity and may be an innate element of the promoter or a heterologous element inserted to enhance the level or tissue specificity of a promoter. It is capable of operating in both orientations (normal or flipped), and is capable of functioning even when moved either upstream or downstream from the promoter. Both enhancers and other upstream promoter elements bind sequence-specific DNA-binding proteins that mediate their effects. Promoters may be derived in their entirety from a native gene, or be composed of different elements derived from different promoters found in nature, or even be comprised of synthetic DNA segments. A promoter may also contain DNA sequences that are involved in the binding of protein factors that control the effectiveness of transcription initiation in response to physiological or developmental conditions.

The “initiation site” is the position surrounding the first nucleotide that is part of the transcribed sequence, which is also defined as position +1. With respect to this site all other sequences of the gene and its controlling regions are numbered. Downstream sequences (i.e., further protein encoding sequences in the 3′ direction) are denominated positive, while upstream sequences (mostly of the controlling regions in the 5′ direction) are denominated negative. Some mammalian promoters, such as the CCT promoter, have two transcription initiation sites.
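The numbering convention just described can be illustrated with a minimal sketch. The helper names (position_to_index, index_to_position, slice_region), the toy sequence, and the choice of which nucleotide is +1 are all hypothetical and are not taken from this specification. The sketch also assumes the common convention that there is no position 0, so that −1 lies immediately 5′ of +1.

```python
def position_to_index(position: int, tss_index: int) -> int:
    """Map a signed position (+1 = first transcribed nucleotide) to a 0-based index.

    tss_index is the 0-based index of the +1 nucleotide in the sequence string.
    Assumption: the numbering skips 0, so -1 is directly upstream of +1.
    """
    if position == 0:
        raise ValueError("position 0 does not exist in this numbering scheme")
    if position > 0:
        # Downstream (3') positions are positive; +1 sits at tss_index itself.
        return tss_index + position - 1
    # Upstream (5') positions are negative; -1 sits at tss_index - 1.
    return tss_index + position


def index_to_position(index: int, tss_index: int) -> int:
    """Inverse of position_to_index."""
    offset = index - tss_index
    return offset + 1 if offset >= 0 else offset


def slice_region(sequence: str, start: int, end: int, tss_index: int) -> str:
    """Return the nucleotides from position start through position end, inclusive."""
    i = position_to_index(start, tss_index)
    j = position_to_index(end, tss_index)
    if i < 0 or j >= len(sequence) or i > j:
        raise ValueError("region falls outside the sequence or is reversed")
    return sequence[i:j + 1]


if __name__ == "__main__":
    toy = "AAACCCGGGTTT"   # hypothetical sequence; the first G (index 6) is taken as +1
    print(slice_region(toy, -3, 3, tss_index=6))
```

A region written as −3/+3 therefore spans six nucleotides, three on each side of the initiation site, because no nucleotide is numbered 0.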

Promoter elements, particularly a TATA element, that are inactive or that have greatly reduced promoter activity in the absence of upstream activation are referred to as “minimal” or “core” promoters. In the presence of a suitable transcription factor, the minimal promoter functions to permit transcription. A minimal or core promoter thus consists only of all basal elements needed for transcription initiation, e.g., a TATA box and/or an initiator.

As used herein, the term “CCT promoter” means a nucleotide sequence, which when operably linked to a preselected DNA segment that encodes a protein, RNA transcript, or mixture thereof, results in the expression of the linked preselected DNA segment. As used herein, the term CCT promoter includes the full-length CCTp208 and mCCTp240 promoter and biologically active subunits of these promoters. The term CCT promoter also includes sequences that are substantially identical to that of SEQ ID NO:1 and SEQ ID NO:2. Sequence comparisons may be carried out using a Smith-Waterman sequence alignment algorithm (see e.g., Waterman (1995) or http://www-hto.usc.edu/software/seqaln/index.html). The localS program, version 1.16, can preferably be used with the following parameters: match: 1, mismatch penalty: 0.33, open-gap penalty: 2, extended-gap penalty: 2.
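For illustration only, a Smith-Waterman local alignment score with the parameters quoted above can be sketched as follows. This is not the localS program; it is a generic affine-gap (Gotoh-style) implementation, and the function name smith_waterman_score is hypothetical. It assumes that a gap of length k costs open-gap + (k − 1) × extended-gap, which is one of several conventions and may differ from the one localS uses.

```python
def smith_waterman_score(a: str, b: str,
                         match: float = 1.0,
                         mismatch: float = 0.33,
                         gap_open: float = 2.0,
                         gap_extend: float = 2.0) -> float:
    """Best local alignment score between a and b (score only, no traceback).

    Assumption: a gap of length k costs gap_open + (k - 1) * gap_extend.
    """
    a, b = a.upper(), b.upper()
    neg_inf = float("-inf")
    cols = len(b) + 1
    h_prev = [0.0] * cols        # best score of an alignment ending at (i-1, j)
    f_prev = [neg_inf] * cols    # same, but ending with a gap in b
    best = 0.0
    for i in range(1, len(a) + 1):
        h_cur = [0.0] * cols
        f_cur = [neg_inf] * cols
        e = neg_inf              # best score ending at (i, j) with a gap in a
        for j in range(1, cols):
            # Open a new gap from the cell to the left, or extend the current one.
            e = max(h_cur[j - 1] - gap_open, e - gap_extend)
            # Open a new gap from the cell above, or extend the current one.
            f_cur[j] = max(h_prev[j] - gap_open, f_prev[j] - gap_extend)
            sub = match if a[i - 1] == b[j - 1] else -mismatch
            # Local alignment: scores are never allowed to drop below zero.
            h_cur[j] = max(0.0, h_prev[j - 1] + sub, e, f_cur[j])
            best = max(best, h_cur[j])
        h_prev, f_prev = h_cur, f_cur
    return best


if __name__ == "__main__":
    print(smith_waterman_score("ACGTACGT", "ACGTTCGT"))
```

With these parameters a single mismatch is cheap relative to a gap, so two sequences that differ by one substitution align end to end rather than being split into two shorter local matches.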

A “portion,” “fragment” or “subunit” of the promoters of the present invention is a sequence having at least about 50-70 nucleotides, at least about 100 nucleotides, at least about 150 nucleotides, at least about 208 nucleotides, or at least about 240 nucleotides, so as to confer promoter activity at a biologically active level.

As used herein, “biologically active” means that the promoter has at least about 0.1%, 10%, 25%, 50%, 75%, 80%, 85%, even 90% or more, e.g. 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% of the activity of the CCT promoter comprising SEQ ID NO:1 or SEQ ID NO:2. The activity of a promoter can be determined by methods well known to the art. For example, see Sambrook et al., Molecular Cloning: A Laboratory Manual (1989). Promoters of the present invention that are not identical to SEQ ID NO:1 or SEQ ID NO:2, but retain comparable biological activity, are called variant promoters. The nucleotide sequences of the invention include both naturally occurring sequences as well as recombinant forms.

The term “nucleic acid” refers to deoxyribonucleotides or ribonucleotides and polymers thereof in either single- or double-stranded form, composed of monomers (nucleotides) containing a sugar, phosphate and a base that is either a purine or pyrimidine. Unless specifically limited, the term encompasses nucleic acids containing known analogs of natural nucleotides that have similar binding properties as the reference nucleic acid and are metabolized in a manner similar to naturally occurring nucleotides. A “nucleic acid fragment” is a fraction of a given nucleic acid molecule. The term “nucleotide sequence” refers to a polymer of DNA or RNA that can be single- or double-stranded, optionally containing synthetic, non-natural or altered nucleotide bases capable of incorporation into DNA or RNA polymers. The terms “nucleic acid”, “nucleic acid molecule”, “nucleic acid fragment”, “nucleic acid sequence or segment”, or “polynucleotide” may also be used interchangeably with gene, cDNA, DNA and RNA encoded by a gene.

The invention encompasses isolated or substantially purified nucleic acid compositions. In the context of the present invention, an “isolated” or “purified” DNA molecule is a nucleic acid molecule that exists apart from its native environment and is therefore not a product of nature. An isolated nucleic acid molecule may exist in a purified form or may exist in a non-native environment such as, for example, a transgenic host cell. For example, an “isolated” or “purified” nucleic acid molecule, or biologically active portion thereof, is substantially free of other cellular material, or culture medium when produced by recombinant techniques, or substantially free of chemical precursors or other chemicals when chemically synthesized. In one embodiment, an “isolated” nucleic acid is free of sequences that naturally flank the nucleic acid (i.e., sequences located at the 5′ and 3′ ends of the nucleic acid) in the genomic DNA of the organism from which the nucleic acid is derived. For example, in various embodiments, the isolated nucleic acid molecule can contain less than about 70 bp, 50 bp, 10 bp, or even 5 bp, 4 bp, 3 bp, 2 bp or 1 bp of nucleotide sequences that naturally flank the nucleic acid molecule in genomic DNA of the cell from which the nucleic acid is derived. By “fragment” or “portion” is meant a full length or less than full length of the nucleotide sequence.

The term “gene” is used broadly to refer to any segment of nucleic acid associated with a biological function. Thus, genes include coding sequences and/or the regulatory sequences required for their expression. For example, gene refers to a nucleic acid fragment that expresses mRNA, functional RNA, or specific protein, including regulatory sequences. Genes also include nonexpressed DNA segments that, for example, form recognition sequences for other proteins. Genes can be obtained from a variety of sources, including cloning from a source of interest or synthesizing from known or predicted sequence information, and may include sequences designed to have desired parameters. A “transgene” refers to a gene that has been introduced into the genome by transformation and is stably maintained. Transgenes may include, for example, DNA that is either heterologous or homologous to the DNA of a particular cell to be transformed. Additionally, transgenes may comprise native genes inserted into a non-native organism, or chimeric genes. The term “endogenous gene” refers to a native gene in its natural location in the genome of an organism. A “foreign” gene refers to a gene not normally found in the host organism but that is introduced by gene transfer.

“Naturally occurring” is used to describe an object that can be found in nature as distinct from being artificially produced. For example, a nucleotide sequence present in an organism (including a virus) that can be isolated from a source in nature and has not been intentionally modified by man in the laboratory, is “naturally occurring.”

The term “chimeric” refers to any gene or DNA that contains 1) DNA sequences, including regulatory and coding sequences, that are not found together in nature, or 2) sequences encoding parts of proteins not naturally adjoined, or 3) parts of promoters that are not naturally adjoined. Accordingly, a chimeric gene may comprise regulatory sequences and coding sequences that are derived from different sources, or comprise regulatory sequences and coding sequences derived from the same source, but arranged in a manner different from that found in nature.

“Recombinant DNA molecule” is a combination of DNA sequences that are joined together using recombinant DNA technology and procedures used to join together DNA sequences as described, for example, in Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor, N.Y.: Cold Spring Harbor Laboratory Press (1989).

A “vector” is defined to include, inter alia, any plasmid, cosmid, phage or binary vector in double or single stranded linear or circular form that may or may not be self transmissible or mobilizable, and can transform a prokaryotic or eukaryotic host either by integrating into the cellular genome or by existing extrachromosomally (e.g., an autonomously replicating plasmid with an origin of replication).

“Expression cassette” as used herein means a DNA sequence capable of directing expression of a particular nucleotide sequence in an appropriate host cell, comprising a promoter operably linked to the nucleotide sequence of interest that is operably linked to termination signals. It also typically comprises sequences required for proper translation of the nucleotide sequence. The coding region usually codes for a protein of interest but may also code for a functional RNA of interest, for example antisense RNA or a nontranslated RNA, in the sense or antisense direction. The expression cassette comprising the nucleotide sequence of interest may be chimeric, meaning that at least one of its components is heterologous with respect to at least one of its other components. The expression cassette may also be one that is naturally occurring but has been obtained in a recombinant form useful for heterologous expression. Such expression cassettes will comprise the transcriptional initiation region linked to a nucleotide sequence of interest. Such an expression cassette may be provided with a plurality of restriction sites for insertion of the gene of interest to be under the transcriptional regulation of the regulatory regions. The expression cassette may additionally contain selectable marker genes.

“Coding sequence” refers to a DNA or RNA sequence that codes for a specific amino acid sequence and excludes the non-coding sequences. It may constitute an “uninterrupted coding sequence”, i.e., lacking an intron, such as in a cDNA or it may include one or more introns bounded by appropriate splice junctions. An “intron” is a sequence of RNA that is contained in the primary transcript but is removed through cleavage and re-ligation of the RNA within the cell to create the mature mRNA that can be translated into a protein.

The terms “open reading frame” and “ORF” refer to the amino acid sequence encoded between translation initiation and termination codons of a coding sequence. The terms “initiation codon” and “termination codon” refer to a unit of three adjacent nucleotides (‘codon’) in a coding sequence that specifies initiation and chain termination, respectively, of protein synthesis (mRNA translation).
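As a hedged illustration of how initiation and termination codons delimit an open reading frame, the sketch below scans the three forward frames of a DNA string. The function name find_orfs, the min_codons parameter, and the toy sequence are hypothetical. The sketch assumes ATG as the only initiation codon and TAA, TAG and TGA as termination codons (the standard genetic code), and it reports nucleotide spans rather than the encoded amino acid sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}   # standard-code termination codons
START_CODON = "ATG"                   # assumption: ATG is the only initiation codon


def find_orfs(dna: str, min_codons: int = 2) -> list:
    """Return sorted (start, end) spans, 0-based and half-open, of forward-strand ORFs.

    Each span runs from the first nucleotide of an initiation codon through the
    last nucleotide of the first in-frame termination codon that follows it.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == START_CODON:
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None          # resume the search after this stop codon
    return sorted(orfs)


if __name__ == "__main__":
    toy = "CCATGAAATGACC"             # hypothetical sequence
    for start, end in find_orfs(toy):
        print(start, end, toy[start:end])
```

In the toy sequence the second ATG is in a different frame and has no in-frame termination codon before the end of the string, so only one ORF is reported.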

“Regulatory sequences” and “suitable regulatory sequences” each refer to nucleotide sequences located upstream (5′ non-coding sequences), within, or downstream (3′ non-coding sequences) of a coding sequence, and influence the transcription, RNA processing or stability, or translation of the associated coding sequence. Regulatory sequences include enhancers, promoters, translation leader sequences, introns, and polyadenylation signal sequences. They include natural and synthetic sequences as well as sequences that may be a combination of synthetic and natural sequences. As is noted above, the term “suitable regulatory sequences” is not limited to promoters.

“5′ non-coding sequence” refers to a nucleotide sequence located 5′ (upstream) to the coding sequence. It is present in the fully processed mRNA upstream of the initiation codon and may affect processing of the primary transcript to mRNA, mRNA stability or translation efficiency (Turner et al., Mol. Biotech., 3:225 (1995)).

“3′ non-coding sequence” refers to nucleotide sequences located 3′ (downstream) to a coding sequence and include polyadenylation signal sequences and other sequences encoding regulatory signals capable of affecting mRNA processing or gene expression. The polyadenylation signal is usually characterized by affecting the addition of polyadenylic acid tracts to the 3′ end of the mRNA precursor.

The term “translation leader sequence” refers to that DNA sequence portion of a gene between the promoter and coding sequence that is transcribed into RNA and is present in the fully processed mRNA upstream (5′) of the translation start codon. The translation leader sequence may affect processing of the primary transcript to mRNA, mRNA stability or translation efficiency.

“Operably-linked” refers to the association of nucleic acid sequences on a single nucleic acid fragment so that the function of one is affected by the other. For example, a regulatory DNA sequence is said to be “operably linked to” or “associated with” a DNA sequence that codes for an RNA or a polypeptide if the two sequences are situated such that the regulatory DNA sequence affects expression of the coding DNA sequence (i.e., that the coding sequence or functional RNA is under the transcriptional control of the promoter). Coding sequences can be operably-linked to regulatory sequences in sense or antisense orientation.

“Expression” refers to the transcription and/or translation of an endogenous gene or a transgene in cells. For example, in the case of antisense constructs, expression may refer to the transcription of the antisense DNA only. In addition, expression refers to the transcription and stable accumulation of sense (mRNA) or functional RNA. Expression may also refer to the production of protein.

The terms “cis-acting sequence” and “cis-acting element” refer to DNA or RNA sequences whose functions require them to be on the same molecule.

The terms “trans-acting sequence” and “trans-acting element” refer to DNA or RNA sequences whose function does not require them to be on the same molecule.

The following terms are used to describe the sequence relationships between two or more nucleic acids: (a) “reference sequence”, (b) “comparison window”, (c) “sequence identity”, (d) “percentage of sequence identity”, and (e) “substantial identity”.

(a) As used herein, “reference sequence” is a defined sequence used as a basis for sequence comparison. A reference sequence may be a subset or the entirety of a specified sequence; for example, as a segment of a full length cDNA or gene sequence, or the complete cDNA or gene sequence.

(b) As used herein, “comparison window” makes reference to a contiguous and specified segment of a polynucleotide sequence, wherein the polynucleotide sequence in the comparison window may comprise additions or deletions (i.e., gaps) compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two sequences. Generally, the comparison window is at least 20 contiguous nucleotides in length, and optionally can be 30, 40, 50, 100, or longer. Those of skill in the art understand that to avoid a high similarity to a reference sequence due to inclusion of gaps in the polynucleotide sequence a gap penalty is typically introduced and is subtracted from the number of matches.

Methods of alignment of sequences for comparison are well known in the art. Thus, the determination of percent identity between any two sequences can be accomplished using a mathematical algorithm. Preferred, non-limiting examples of such mathematical algorithms are the algorithm of Myers and Miller, CABIOS, 4:11 (1988); the local homology algorithm of Smith et al., Adv. Appl. Math., 2:482 (1981); the homology alignment algorithm of Needleman and Wunsch, JMB, 48:443 (1970); the search-for-similarity-method of Pearson and Lipman, Proc. Natl. Acad. Sci. USA, 85:2444 (1988); the algorithm of Karlin and Altschul, Proc. Natl. Acad. Sci. USA, 87:2264 (1990), modified as in Karlin and Altschul, Proc. Natl. Acad. Sci. USA, 90:5873 (1993).

Computer implementations of these mathematical algorithms can be utilized for comparison of sequences to determine sequence identity. Such implementations include, but are not limited to: CLUSTAL in the PC/Gene program (available from Intelligenetics, Mountain View, Calif.); the ALIGN program (Version 2.0) and GAP, BESTFIT, BLAST, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Version 8 (available from Genetics Computer Group (GCG), 575 Science Drive, Madison, Wis., USA). Alignments using these programs can be performed using the default parameters. The CLUSTAL program is well described by Higgins et al., Gene, 73:237 (1988); Higgins et al., CABIOS, 5:151 (1989); Corpet et al., Nucl. Acids Res., 16:10881 (1988); Huang et al., CABIOS, 8:155 (1992); and Pearson et al., Meth. Mol. Biol., 24:307 (1994). The ALIGN program is based on the algorithm of Myers and Miller, supra. The BLAST programs of Altschul et al., JMB, 215:403 (1990); Nucl. Acids Res., 25:3389 (1997), are based on the algorithm of Karlin and Altschul supra.

Software for performing BLAST analyses is publicly available through the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/). This algorithm involves first identifying high scoring sequence pairs (HSPs) by identifying short words of length W in the query sequence, which either match or satisfy some positive-valued threshold score T when aligned with a word of the same length in a database sequence. T is referred to as the neighborhood word score threshold. These initial neighborhood word hits act as seeds for initiating searches to find longer HSPs containing them. The word hits are then extended in both directions along each sequence for as far as the cumulative alignment score can be increased. Cumulative scores are calculated using, for nucleotide sequences, the parameters M (reward score for a pair of matching residues; always >0) and N (penalty score for mismatching residues; always <0). For amino acid sequences, a scoring matrix is used to calculate the cumulative score. Extension of the word hits in each direction is halted when the cumulative alignment score falls off by the quantity X from its maximum achieved value, the cumulative score goes to zero or below due to the accumulation of one or more negative-scoring residue alignments, or the end of either sequence is reached.
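The seed-and-extend procedure described above can be sketched in simplified form. This is not the BLAST implementation: the function names find_seeds and extend_right are hypothetical, seeds are restricted to exact word matches (the neighborhood threshold T is not modeled), only ungapped rightward extension is shown (leftward extension is symmetric), and the value X = 20 is an arbitrary assumption. The reward and penalty of 5 and −4 are the BLASTN defaults for M and N quoted in this description.

```python
def find_seeds(query: str, subject: str, word_len: int) -> list:
    """Return (query_index, subject_index) pairs where a word of length W matches exactly."""
    index = {}
    for i in range(len(query) - word_len + 1):
        index.setdefault(query[i:i + word_len], []).append(i)
    hits = []
    for j in range(len(subject) - word_len + 1):
        for i in index.get(subject[j:j + word_len], []):
            hits.append((i, j))
    return hits


def extend_right(query: str, subject: str, q_start: int, s_start: int,
                 word_len: int, reward: int = 5, penalty: int = -4,
                 x_drop: int = 20) -> tuple:
    """Extend an exact word hit to the right without gaps.

    Returns (best_score, best_length). Extension stops when the running score
    falls x_drop below its maximum, drops to zero or below, or either sequence ends.
    """
    score = best = reward * word_len      # the seed itself is an exact match
    length = best_len = word_len
    qi, si = q_start + word_len, s_start + word_len
    while qi < len(query) and si < len(subject):
        score += reward if query[qi] == subject[si] else penalty
        length += 1
        if score > best:
            best, best_len = score, length
        elif best - score >= x_drop or score <= 0:
            break
        qi += 1
        si += 1
    return best, best_len


if __name__ == "__main__":
    q, s = "ACGTACGT", "ACGTACGA"         # hypothetical toy sequences
    for qi, si in find_seeds(q, s, 4):
        print((qi, si), extend_right(q, s, qi, si, 4))
```

Keeping the best score and its length separately from the running score is what lets the extension run past an isolated mismatch and still report the highest-scoring segment.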

In addition to calculating percent sequence identity, the BLAST algorithm also performs a statistical analysis of the similarity between two sequences. One measure of similarity provided by the BLAST algorithm is the smallest sum probability (P(N)), which provides an indication of the probability by which a match between two nucleotide or amino acid sequences would occur by chance. For example, a test nucleic acid sequence is considered similar to a reference sequence if the smallest sum probability in a comparison of the test nucleic acid sequence to the reference nucleic acid sequence is less than about 0.1, more preferably less than about 0.01, and most preferably less than about 0.001.

To obtain gapped alignments for comparison purposes, Gapped BLAST (in BLAST 2.0) can be utilized as described in Altschul et al., Nucleic Acids Res., 25:3389 (1997). Alternatively, PSI-BLAST (in BLAST 2.0) can be used to perform an iterated search that detects distant relationships between molecules. See Altschul et al., supra. When utilizing BLAST, Gapped BLAST, PSI-BLAST, the default parameters of the respective programs (e.g., BLASTN for nucleotide sequences, BLASTX for proteins) can be used. The BLASTN program (for nucleotide sequences) uses as defaults a wordlength (W) of 11, an expectation (E) of 10, a cutoff of 100, M=5, N=−4, and a comparison of both strands. For amino acid sequences, the BLASTP program uses as defaults a wordlength (W) of 3, an expectation (E) of 10, and the BLOSUM62 scoring matrix. See http://www.ncbi.nlm.nih.gov. Alignment may also be performed manually by inspection.

For purposes of the present invention, comparison of nucleotide sequences for determination of percent sequence identity to the promoter sequences disclosed herein is preferably made using the BlastN program (version 1.4.7 or later) with its default parameters or any equivalent program. By “equivalent program” is intended any sequence comparison program that, for any two sequences in question, generates an alignment having identical nucleotide or amino acid residue matches and an identical percent sequence identity when compared to the corresponding alignment generated by the preferred program.

(c) As used herein, “sequence identity” or “identity” in the context oftwo nucleic acid sequences makes reference to a specified percentage ofresidues in the two sequences that are the same when aligned for maximumcorrespondence over a specified comparison window, as measured bysequence comparison algorithms or by visual inspection.

(d) As used herein, “percentage of sequence identity” means the value determined by comparing two optimally aligned sequences over a comparison window, wherein the portion of the polynucleotide sequence in the comparison window may comprise additions or deletions (i.e., gaps) as compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two sequences. The percentage is calculated by determining the number of positions at which the identical nucleic acid base occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the window of comparison, and multiplying the result by 100 to yield the percentage of sequence identity.
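By way of illustration only, the calculation defined in (d) can be sketched as follows. The function name, the gap character, and the two short aligned strings are hypothetical and are not taken from the sequences disclosed herein; the sketch assumes the two sequences have already been optimally aligned, so that both strings have the same length.

```python
def percent_identity(aligned_ref: str, aligned_test: str, gap: str = "-") -> float:
    """Percentage of sequence identity over a comparison window.

    Both inputs are assumed to be pre-aligned strings of equal length,
    with `gap` marking additions or deletions (an illustrative convention).
    """
    if len(aligned_ref) != len(aligned_test):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_ref:
        raise ValueError("comparison window is empty")
    # Count positions at which the identical base occurs in both sequences.
    matches = sum(
        1
        for a, b in zip(aligned_ref.upper(), aligned_test.upper())
        if a == b and a != gap
    )
    # Divide by the total number of positions in the window, then scale to 100.
    return 100.0 * matches / len(aligned_ref)


# Hypothetical 10-position window: 8 matched positions, 1 mismatch, 1 gap.
identity = percent_identity("ACGTACGTAC", "ACGTTCG-AC")
```

In this sketch gap positions count toward the window length but never toward the matched positions, which follows the wording of the definition above.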

(e) The terms “substantial identity” or “substantially identical” when referring to polynucleotide sequences mean that a polynucleotide comprises a sequence that has at least 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, or 79%, preferably at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, or 89%, more preferably at least 90%, 91%, 92%, 93%, or 94%, and most preferably at least 95%, 96%, 97%, 98%, or 99% sequence identity, compared to a reference sequence using one of the alignment programs described using standard parameters.

Another indication that nucleotide sequences are substantially identical is if two molecules hybridize to each other under stringent conditions. Generally, stringent conditions are selected to be about 5° C. lower than the thermal melting point (T_(m)) for the specific sequence at a defined ionic strength and pH. However, stringent conditions encompass temperatures in the range of about 1° C. to about 20° C., depending upon the desired degree of stringency as otherwise qualified herein. Nucleic acids that do not hybridize to each other under stringent conditions are still substantially identical if the polypeptides they encode are substantially identical. This may occur, e.g., when a copy of a nucleic acid is created using the maximum codon degeneracy permitted by the genetic code. One indication that two nucleic acid sequences are substantially identical is when the polypeptide encoded by the first nucleic acid is immunologically cross reactive with the polypeptide encoded by the second nucleic acid.

For sequence comparison, typically one sequence acts as a referencesequence to which test sequences are compared. When using a sequencecomparison algorithm, test and reference sequences are input into acomputer, subsequence coordinates are designated if necessary, andsequence algorithm program parameters are designated. The sequencecomparison algorithm then calculates the percent sequence identity forthe test sequence(s) relative to the reference sequence, based on thedesignated program parameters.

As noted above, another indication that two nucleic acid sequences are substantially identical is that the two molecules hybridize to each other under stringent conditions. The phrase “hybridizing specifically to” refers to the binding, duplexing, or hybridizing of a molecule only to a particular nucleotide sequence under stringent conditions when that sequence is present in a complex mixture (e.g., total cellular DNA or RNA). “Bind(s) substantially” refers to complementary hybridization between a probe nucleic acid and a target nucleic acid and embraces minor mismatches that can be accommodated by reducing the stringency of the hybridization media to achieve the desired detection of the target nucleic acid sequence.

“Stringent hybridization conditions” and “stringent hybridization wash conditions” in the context of nucleic acid hybridization experiments such as Southern and Northern hybridizations are sequence dependent, and are different under different environmental parameters. Longer sequences hybridize specifically at higher temperatures. The thermal melting point (T_(m)) is the temperature (under defined ionic strength and pH) at which 50% of the target sequence hybridizes to a perfectly matched probe. Specificity is typically the function of post-hybridization washes, the critical factors being the ionic strength and temperature of the final wash solution. For DNA-DNA hybrids, the T_(m) can be approximated from the equation of Meinkoth and Wahl, Anal. Biochem., 138:267 (1984): T_(m) = 81.5° C. + 16.6 (log M) + 0.41 (% GC) − 0.61 (% form) − 500/L; where M is the molarity of monovalent cations, % GC is the percentage of guanosine and cytosine nucleotides in the DNA, % form is the percentage of formamide in the hybridization solution, and L is the length of the hybrid in base pairs. T_(m) is reduced by about 1° C. for each 1% of mismatching; thus, T_(m), hybridization, and/or wash conditions can be adjusted to hybridize to sequences of the desired identity. For example, if sequences with >90% identity are sought, the T_(m) can be decreased 10° C. Generally, stringent conditions are selected to be about 5° C. lower than the T_(m) for the specific sequence and its complement at a defined ionic strength and pH. However, severely stringent conditions can utilize a hybridization and/or wash at 1, 2, 3, or 4° C. lower than the T_(m); moderately stringent conditions can utilize a hybridization and/or wash at 6, 7, 8, 9, or 10° C. lower than the T_(m); low stringency conditions can utilize a hybridization and/or wash at 11, 12, 13, 14, 15, or 20° C. lower than the T_(m). Using the equation, hybridization and wash compositions, and desired temperature, those of ordinary skill will understand that variations in the stringency of hybridization and/or wash solutions are inherently described. If the desired degree of mismatching results in a temperature of less than 45° C. (aqueous solution) or 32° C. (formamide solution), it is preferred to increase the SSC concentration so that a higher temperature can be used. An extensive guide to the hybridization of nucleic acids is found in Tijssen, Laboratory Techniques in Biochemistry and Molecular Biology: Hybridization with Nucleic Acid Probes, part I, chapter 2, “Overview of principles of hybridization and the strategy of nucleic acid probe assays,” Elsevier, N.Y. (1993). Generally, highly stringent hybridization and wash conditions are selected to be about 5° C. lower than the T_(m) for the specific sequence at a defined ionic strength and pH.
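As a non-limiting worked sketch, the Meinkoth and Wahl approximation and the rule of about 1° C. per 1% of mismatching can be evaluated as follows. The function names and the input values are hypothetical, and the logarithm is taken as base 10, which is an assumption about the equation as conventionally applied.

```python
import math


def melting_temperature(monovalent_molar: float, percent_gc: float,
                        percent_formamide: float, hybrid_length_bp: int) -> float:
    """Approximate T_m (deg C) of a DNA-DNA hybrid.

    T_m = 81.5 + 16.6*log10(M) + 0.41*(%GC) - 0.61*(%form) - 500/L
    """
    return (81.5
            + 16.6 * math.log10(monovalent_molar)
            + 0.41 * percent_gc
            - 0.61 * percent_formamide
            - 500.0 / hybrid_length_bp)


def tm_for_identity(tm_perfect_match: float, percent_identity_sought: float) -> float:
    """Lower T_m by about 1 deg C for each 1% of mismatching tolerated."""
    return tm_perfect_match - (100.0 - percent_identity_sought)


# Hypothetical inputs: 1 M monovalent cation, 50% GC, no formamide, 100 bp hybrid.
tm = melting_temperature(1.0, 50.0, 0.0, 100)   # about 97 deg C
tm_90 = tm_for_identity(tm, 90.0)               # about 10 deg C lower
stringent_wash = tm_90 - 5.0                    # about 5 deg C below the adjusted T_m
```

The last line applies the general guideline that stringent conditions are about 5° C. lower than the T_(m); other offsets within the ranges recited above could be substituted.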

An example of highly stringent wash conditions is 0.15 M NaCl at 72° C. for about 15 minutes. An example of stringent wash conditions is a 0.2×SSC wash at 65° C. for 15 minutes (see, Sambrook, infra, for a description of SSC buffer). Often, a high stringency wash is preceded by a low stringency wash to remove background probe signal. An example medium stringency wash for a duplex of, e.g., more than 100 nucleotides, is 1×SSC at 45° C. for 15 minutes. An example low stringency wash for a duplex of, e.g., more than 100 nucleotides, is 4-6×SSC at 40° C. for 15 minutes. Stringent conditions typically involve salt concentrations of less than about 1.5 M, more preferably about 0.01 to 1.0 M, Na ion concentration (or other salts) at pH 7.0 to 8.3, and the temperature is typically at least about 30° C. for short probes (e.g., about 10 to 50 nucleotides) and at least about 60° C. for long probes (e.g., >50 nucleotides). Stringent conditions may also be achieved with the addition of destabilizing agents such as formamide. In general, a signal to noise ratio of 2× (or higher) than that observed for an unrelated probe in the particular hybridization assay indicates detection of a specific hybridization. Nucleic acids that do not hybridize to each other under stringent conditions are still substantially identical if the proteins that they encode are substantially identical. This occurs, e.g., when a copy of a nucleic acid is created using the maximum codon degeneracy permitted by the genetic code.

Very stringent conditions are selected to be equal to the T_(m) for a particular probe. An example of stringent conditions for hybridization of complementary nucleic acids that have more than 100 complementary residues on a filter in a Southern or Northern blot is hybridization in 50% formamide, 1 M NaCl, 1% SDS at 37° C., and a wash in 0.1×SSC at 60 to 65° C. Exemplary low stringency conditions include hybridization with a buffer solution of 30 to 35% formamide, 1 M NaCl, 1% SDS (sodium dodecyl sulphate) at 37° C., and a wash in 1× to 2×SSC (20×SSC=3.0 M NaCl/0.3 M trisodium citrate) at 50 to 55° C. Exemplary moderate stringency conditions include hybridization in 40 to 45% formamide, 1.0 M NaCl, 1% SDS at 37° C., and a wash in 0.5× to 1×SSC at 55 to 60° C.
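For orientation, the salt content of the SSC washes recited above follows from the stated composition of 20×SSC (3.0 M NaCl/0.3 M trisodium citrate). The following sketch, in which the function name and the returned keys are hypothetical, scales that composition to any dilution; the sodium ion total assumes that each trisodium citrate contributes three sodium ions.

```python
SSC_20X_NACL_M = 3.0      # NaCl molarity of 20x SSC, as stated in the text
SSC_20X_CITRATE_M = 0.3   # trisodium citrate molarity of 20x SSC


def ssc_composition(fold: float) -> dict:
    """Molar composition of an SSC solution at the given fold strength."""
    scale = fold / 20.0
    nacl = SSC_20X_NACL_M * scale
    citrate = SSC_20X_CITRATE_M * scale
    return {
        "NaCl_M": nacl,
        "citrate_M": citrate,
        # Assumption: NaCl gives one Na+ and trisodium citrate gives three.
        "Na_ion_M": nacl + 3.0 * citrate,
    }


# Wash strengths named in the text, from most to least stringent.
for fold in (0.1, 0.2, 0.5, 1.0, 2.0):
    c = ssc_composition(fold)
    print(f"{fold}x SSC: {c['NaCl_M']:.4f} M NaCl, {c['Na_ion_M']:.4f} M Na+")
```

On this arithmetic a 1×SSC wash contains 0.15 M NaCl, the same NaCl concentration as the 0.15 M NaCl wash given above as an example of highly stringent wash conditions at 72° C.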

“Transformed,” “transgenic,” and “recombinant” refer to a host cell or organism into which a heterologous nucleic acid molecule has been introduced. The nucleic acid molecule can be stably integrated into the genome by methods generally known in the art, which are disclosed in Sambrook et al., Molecular Cloning: A Laboratory Manual (2d ed., Cold Spring Harbor Laboratory Press, Plainview, N.Y.) (1989). See also Innis et al., PCR Protocols, Academic Press (1995); Gelfand, PCR Strategies, Academic Press (1995); and Innis and Gelfand, PCR Methods Manual, Academic Press (1999). Known methods of PCR include, but are not limited to, methods using paired primers, nested primers, single specific primers, degenerate primers, gene-specific primers, vector-specific primers, partially mismatched primers, and the like. For example, “transformed,” “transformant,” and “transgenic” cells have been through the transformation process and contain a foreign gene integrated into their chromosome. The term “untransformed” refers to normal cells that have not been through the transformation process.

“Significant increase” is an increase that is larger than the margin oferror inherent in the measurement technique, preferably an increase byabout 2-fold or greater.

I. Recipient Cells

The present invention employs recipient eukaryotic cells that aresusceptible to transformation. Such cells can be of plant or animalorigin, such as mammalian cells. For example, studies testing theCCTp208 promoter utilized transformed murine lung epithelial cell lines(MLE-12), human adenocarcinoma lung cell lines (A549), a hepatoma cellline (HepG2), and immortalized fetal rat lung alveolar epithelial celllines (IFT2).

Nutrients are provided to the cell cultures in the form of media and theenvironmental conditions for the cultures are controlled. Media andenvironmental conditions that support the growth of cell cultures arewell known to the art.

II. DNA Sequences

Virtually any DNA composition may be used for delivery to recipientcells in accordance with the present invention. The DNA segment or genechosen for cellular introduction will often encode a protein and can beexpressed in the resultant transformed cells. The DNA segment or genechosen for cellular introduction may also encode anti-sense RNA, i.e., acomplement of a predetermined RNA molecule, or a portion thereof, thatis expressed in an untransformed cell. The transcription of ananti-sense RNA suppresses the expression of the complementary RNA, e.g.,one that encodes an undesirable property. Thus, a preselected DNAsegment, in the form of vectors and plasmids, or linear DNA fragments,in some instances containing only the DNA element to be expressed in theeukaryote may be employed.

A replicating vector may also be useful for delivery of a target gene. Examples of useful vectors include pGL2-Basic Vector (firefly luciferase) from Promega, pCAT3-Basic Vector (chloramphenicol acetyltransferase, CAT) from Promega, or a pSEAP2 Vector (secreted alkaline phosphatase) from Clontech.

DNA useful for introduction into cells includes that which has beenderived or isolated from any source, that may be subsequentlycharacterized as to structure, size and/or function. An example of suchDNA “isolated” from a source would be a useful DNA sequence that isexcised or removed from a source by chemical means, e.g., by the use ofrestriction endonucleases, so that it can be further manipulated, e.g.,separated or amplified, e.g., via polymerase chain reaction (PCR), foruse in the invention, by the methodology of genetic engineering.Recovery or isolation of a given fragment of DNA from a restrictiondigest can employ separation of the digest on polyacrylamide or agarosegel by electrophoresis, identification of the fragment of interest bycomparison of its mobility versus that of marker DNA fragments of knownmolecular weight, removal of the gel section containing the desiredfragment, and separation of the gel from DNA. See Lawn et al., NucleicAcids Res., 9:6103 (1981), and Goeddel et al., Nucleic Acids Res.,8:4057 (1980). Thus, DNA is “isolated” in that it is free from at leastone contaminating nucleic acid with which it is normally associated inthe natural source of the RNA or DNA and is preferably substantiallyfree of any other mammalian RNA or DNA. The phrase “free from at leastone contaminating source nucleic acid with which it is normallyassociated” includes the case where the nucleic acid is reintroducedinto the source or natural cell but is in a different chromosomallocation or is otherwise flanked by nucleic acid sequences not normallyfound in the source cell.

An example of DNA “derived” from a source, would be a DNA sequence orsegment that is identified as a useful fragment within a given organism,and that is then chemically synthesized in essentially pure form.Therefore, “recombinant or preselected DNA” includes completelysynthetic DNA sequences, semi-synthetic DNA sequences, DNA sequencesisolated from biological sources, and DNA sequences derived from RNA, aswell as mixtures thereof.

The introduced DNA includes, but is not limited to, DNA from genes suchas those from bacteria, yeasts, animals, plants or viruses. Moreover, itis within the scope of the invention to isolate a preselected DNAsegment from a given genotype, and to subsequently introduce multiplecopies of the preselected DNA segment into the same genotype, e.g., toenhance production of a given gene product. The introduced DNA caninclude modified genes, portions of genes, or chimeric genes, includinggenes from the same or different genotype. The term “chimeric gene” or“chimeric DNA” is defined as a gene or DNA sequence or segmentcomprising at least two DNA sequences or segments from species that donot combine DNA under natural conditions, or DNA sequences or segmentsthat are positioned or linked in a manner that does not normally occurin the native genome of the untransformed cell.

The introduced DNA used for transformation herein may be circular orlinear, double-stranded or single-stranded. Generally, the DNA is in theform of chimeric DNA, such as plasmid DNA, that can also contain codingregions flanked by regulatory sequences that promote the expression ofthe recombinant DNA present in the resultant transformed cell.

Generally, the introduced DNA will be relatively small, i.e., less thanabout 30 kb to minimize any susceptibility to physical, chemical, orenzymatic degradation that is known to increase as the size of the DNAincreases. As noted above, the number of proteins, RNA transcripts ormixtures thereof, encoded by the DNA molecules that are introduced intothe genome is preferably preselected and defined, e.g., from one toabout 5-10 such products of the introduced DNAs may be formed.

A. Preparation of an Expression Cassette

An expression cassette of the invention can comprise a recombinant DNAmolecule containing a preselected DNA segment operably linked to aCCTp208 or mCCTp240 promoter functional in a host cell. The expressioncassette itself may be chimeric, i.e., the cassette comprises DNA fromat least two different species, or comprises DNA from the same species,and is linked or associated in a manner that does not occur in the“native” or wild type of the species.

1. DNA Molecules of the Invention That Comprise a CCT Promoter of thePresent Invention

A promoter is a region of DNA that regulates gene expression. Promoterregions are typically found in the flanking DNA sequence upstream fromthe coding sequence in viruses as well as prokaryotic and eukaryoticcells. A promoter sequence provides for regulation of transcription ofthe downstream gene sequence and typically includes from about 50 toabout 2,000 nucleotide base pairs. Promoter sequences can also containregulatory sequences such as enhancer sequences that can influence thelevel of gene expression. Some isolated promoter sequences can providefor gene expression of heterologous DNAs, that is a DNA different fromthe native or homologous DNA. Promoter sequences are also known to bestrong or weak or inducible. A strong promoter provides for a high levelof gene expression, whereas a weak promoter provides for a very lowlevel of gene expression. An inducible promoter is a promoter thatprovides for turning on and off of gene expression in response to anexogenously added agent or to an environmental or developmentalstimulus. Promoters can also provide for tissue specific ordevelopmental regulation. An isolated promoter sequence that is a strongpromoter for heterologous DNAs is advantageous because it provides for asufficient level of gene expression to allow for easy detection andselection of transformed cells and provides for a high level of geneexpression when desired.

The DNA molecule of the invention comprises a preselected DNA segmentcomprising a CCTp208 (such as SEQ ID NO:1) or mCCTp240 promoter (such asSEQ ID NO:2) that is operably linked to a preselected DNA segment that aperson would like expressed. The desired preselected DNA segment can becombined with the CCT promoter by standard methods as described inSambrook et al., Molecular Cloning: A Laboratory Manual, Cold SpringHarbor (1989)). Briefly, the preselected DNA segment can be subcloneddownstream from the promoter using restriction enzymes to ensure thatthe DNA is inserted in proper orientation with respect to the promoterso that the DNA can be expressed. Once the preselected DNA segment isoperably linked to the promoter, the expression cassette so formed canbe subcloned into a plasmid or other vector.

2. Variants of the DNA Molecules of the Invention

Nucleic acid molecules encoding nucleotide sequence variants of aCCTp208 promoter (e.g., SEQ ID NO:1) or an mCCTp240 promoter (e.g. SEQID NO:2), can be prepared by a variety of methods known in the art.These methods include, but are not limited to, isolation from a naturalsource (in the case of naturally occurring nucleotide sequence variants)or preparation by oligonucleotide-mediated (or site-directed)mutagenesis, PCR mutagenesis, and cassette mutagenesis of an earlierprepared variant or a non-variant version of the CCT promoter of thepresent invention.

Oligonucleotide-mediated mutagenesis is a preferred method for preparingnucleotide substitution variants of the CCT promoter. This technique iswell known in the art as described by Adelman et al., DNA, 2:183 (1983).Briefly, CCT promoter DNA is altered by hybridizing an oligonucleotideencoding the desired mutation to a DNA template, where the template isthe single-stranded form of a plasmid or bacteriophage containing theunaltered or native DNA sequence of the CCT promoter. Afterhybridization, a DNA polymerase is used to synthesize an entire secondcomplementary strand of the template that will thus incorporate theoligonucleotide primer, and will code for the selected alteration in theCCT promoter.

Generally, oligonucleotides of at least about 20-25 nucleotides inlength are used. An optimal oligonucleotide will have 12 to 15nucleotides that are completely complementary to the template on eitherside of the nucleotide(s) coding for the mutation. This ensures that theoligonucleotide will hybridize properly to the single-stranded DNAtemplate molecule. The oligonucleotides are readily synthesized usingtechniques known in the art such as that described by Crea et al., Proc.Natl. Acad. Sci. U.S.A., 75:5765 (1978).
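The design rule above (a mutated position flanked on either side by 12 to 15 fully complementary nucleotides) can be sketched as follows. This is an illustrative helper only: the function names, the 0-based position convention, and the toy template are hypothetical, and the template string is assumed to be the single-stranded template written 5′ to 3′.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def design_mutagenic_oligo(template: str, position: int, new_base: str,
                           flank: int = 12) -> str:
    """Oligonucleotide that anneals to `template` and carries one substitution.

    `position` is the 0-based index in the template of the base to change;
    `new_base` is the desired base written in the template's own sense.
    """
    if not 12 <= flank <= 15:
        raise ValueError("flank should be 12 to 15 nucleotides")
    if position - flank < 0 or position + flank >= len(template):
        raise ValueError("not enough template on one side of the mutation")
    left = template[position - flank:position]
    right = template[position + 1:position + 1 + flank]
    mutated_region = left + new_base + right
    # The annealing oligo is the reverse complement of the mutated region,
    # so both flanks are completely complementary to the template.
    return reverse_complement(mutated_region)


# Toy 40-nt template (hypothetical); change the base at index 20 to T.
toy_template = "ACGT" * 10
oligo = design_mutagenic_oligo(toy_template, 20, "T", flank=12)
```

With a flank of 12 the resulting oligonucleotide is 25 nucleotides long, which falls within the length range stated above.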

The DNA template can be generated by those vectors that are either derived from bacteriophage M13 vectors (the commercially available M13 mp18 and M13 mp19 vectors are suitable), or those vectors that contain a single-stranded phage origin of replication as described by Vieira et al., Meth. Enzymol., 153:3 (1987). Thus, the DNA that is to be mutated may be inserted into one of these vectors to generate single-stranded template. Production of the single-stranded template is described in Sections 4.21-4.41 of Sambrook et al., Molecular Cloning: A Laboratory Manual (Cold Spring Harbor Laboratory Press, N.Y. 1989).

Alternatively, single-stranded DNA template may be generated bydenaturing double-stranded plasmid (or other) DNA using standardtechniques.

For alteration of the native DNA sequence, the oligonucleotide ishybridized to the single-stranded template under suitable hybridizationconditions. A DNA polymerizing enzyme, usually the Klenow fragment ofDNA polymerase I, is then added to synthesize the complementary strandof the template using the oligonucleotide as a primer for synthesis. Aheteroduplex molecule is thus formed such that one strand of DNA encodesthe mutated form of the CCT promoter, and the other strand (the originaltemplate) encodes the native, unaltered sequence of the CCT promoter.This heteroduplex molecule is then transformed into a suitable hostcell, usually a prokaryote such as E. coli JM101. After the cells aregrown, they are plated onto agarose plates and screened using theoligonucleotide primer radiolabeled with ³²P to identify the bacterialcolonies that contain the mutated DNA. The mutated region is thenremoved and placed in an appropriate vector for protein production,generally an expression vector of the type typically employed fortransformation of an appropriate host.

The method described immediately above may be modified such that a homoduplex molecule is created wherein both strands of the plasmid contain the mutation(s). The modifications are as follows: The single-stranded oligonucleotide is annealed to the single-stranded template as described above.

A mixture of three deoxyribonucleotides, deoxyriboadenosine (dATP), deoxyriboguanosine (dGTP), and deoxyribothymidine (dTTP), is combined with a modified thiodeoxyribocytosine called dCTP-(αS) (Amersham Corporation). This mixture is added to the template-oligonucleotide complex. Upon addition of DNA polymerase to this mixture, a strand of DNA identical to the template except for the mutated bases is generated. In addition, this new strand of DNA will contain dCTP-(αS) instead of dCTP, which serves to protect it from restriction endonuclease digestion.

After the template strand of the double-stranded heteroduplex is nickedwith an appropriate restriction enzyme, the template strand can bedigested with ExoIII nuclease or another appropriate nuclease past theregion that contains the site(s) to be mutagenized. The reaction is thenstopped to leave a molecule that is only partially single-stranded. Acomplete double-stranded DNA homoduplex is then formed using DNApolymerase in the presence of all four deoxyribonucleotidetriphosphates, ATP, and DNA ligase. This homoduplex molecule can then betransformed into a suitable host cell such as E. coli JM101.

Embodiments of the invention include an isolated and purified DNAmolecule comprising a preselected DNA segment comprising a CCTp208promoter comprising SEQ ID NO:1 or mCCTp240 promoter comprising SEQ IDNO:2, or nucleotide sequence variants of SEQ ID NO:1 or SEQ ID NO:2 thatretain the biological activity of the promoter.

3. Optional Sequences in the Expression Cassette

The expression cassette can also optionally contain other DNA sequences.In order to improve the ability to identify transformants, one maydesire to employ a selectable or screenable marker gene as, or inaddition to, the expressible preselected DNA segment. “Marker genes” aregenes that impart a distinct phenotype to cells expressing the markergene and thus allow such transformed cells to be distinguished fromcells that do not have the marker. Such genes may encode either aselectable or screenable marker, depending on whether the marker confersa trait that one can ‘select’ for by chemical means, i.e., through theuse of a selective agent (e.g., an antibiotic), or whether it is simplya trait that one can identify through observation or testing. Of course,many examples of suitable marker genes are known to the art and can beemployed in the practice of the invention.

Included within the terms selectable or screenable marker genes are also genes that encode a “secretable marker” whose secretion can be detected as a means of identifying or selecting for transformed cells. Examples include markers that encode a secretable antigen that can be identified by antibody interaction, or even secretable enzymes that can be detected by their catalytic activity. Secretable proteins fall into a number of classes, including small, diffusible proteins detectable, e.g., by ELISA, and small active enzymes detectable in extracellular solution.

Elements of the present disclosure are exemplified in detail through the use of particular marker genes; however, in light of this disclosure, numerous other possible selectable and/or screenable marker genes will be apparent to those of skill in the art in addition to those set forth hereinbelow. Therefore, it will be understood that the following discussion is exemplary rather than exhaustive. In light of the techniques disclosed herein and the general recombinant techniques that are known in the art, the present invention renders possible the introduction of any gene, including marker genes, into a recipient cell.

Transcription enhancers or duplications of enhancers can be used to increase expression from a particular promoter. Also, leader sequences that influence gene expression may be used. These are DNA sequences inserted between the transcription initiation site and the start of the coding sequence, i.e., the untranslated leader sequence. Preferred leader sequences include those that comprise sequences selected to direct optimum expression of the attached gene, i.e., to include a preferred consensus leader sequence that can increase or maintain mRNA stability and prevent inappropriate initiation of translation (Joshi, Nucl. Acids Res., 15:6643 (1987)). Such sequences are known to those of skill in the art. However, some leader sequences have a high degree of secondary structure that is expected to decrease mRNA stability and/or decrease translation of the mRNA. Thus, leader sequences that do not have a high degree of secondary structure or that have a high degree of secondary structure where the secondary structure does not inhibit mRNA stability and/or decrease translation will be most preferred. Other such regulatory elements useful in the practice of the invention are known to those of skill in the art.

Additionally, expression cassettes can be constructed and employed to target the gene product of the preselected DNA segment to an intracellular compartment within cells or to direct a protein to the extracellular environment. This can generally be achieved by joining a DNA sequence encoding a transit or signal peptide sequence to the coding sequence of the preselected DNA segment. The resultant transit, or signal, peptide will transport the protein to a particular intracellular, or extracellular destination, respectively, and can then be post-translationally removed. Transit peptides act by facilitating the transport of proteins through intracellular membranes, e.g., mitochondrial membranes, whereas signal peptides direct proteins through the extracellular membrane. By facilitating transport of the protein into compartments inside or outside the cell, these sequences can increase the accumulation of gene product.

It may be useful to target DNA itself within a cell. For example, it maybe useful to target an introduced preselected DNA to the nucleus as thismay increase the frequency of transformation. Within the nucleus itself,it would be useful to target a gene in order to achieve site-specificintegration. For example, it would be useful to have a gene introducedthrough transformation replace an existing gene in the cell.

When the expression cassette is to be introduced into a cell, theexpression cassette can also optionally include 3′ nontranslatedregulatory DNA sequences that act as a signal to terminate transcriptionand allow for the polyadenylation of the resultant mRNA. The 3′nontranslated regulatory DNA sequence preferably includes from about 300to 1,000 nucleotide base pairs and contains transcriptional andtranslational termination sequences. These 3′ nontranslated regulatorysequences can be obtained as described in An, Methods in Enzymology,153:292 (1987) or are already present in plasmids available fromcommercial sources such as Clontech, Palo Alto, Calif. The 3′nontranslated regulatory sequences can be operably linked to the 3′terminus of the preselected DNA segment.

An expression cassette can also be introduced into an expression vector,such as a plasmid. Plasmid vectors include additional DNA sequences thatprovide for easy selection, amplification, and transformation of theexpression cassette in prokaryotic and eukaryotic cells, e.g.,pUC-derived vectors, pSK-derived vectors, pGEM-derived vectors,pSP-derived vectors, or pBS-derived vectors. Thus, additional DNAsequences include origins of replication to provide for autonomousreplication of the vector, selectable marker genes, preferably encodingantibiotic or herbicide resistance, unique multiple cloning sitesproviding for multiple sites to insert DNA sequences or genes encoded inthe expression cassette, and sequences that enhance transformation ofprokaryotic and eukaryotic cells.

III. DNA Delivery

The expression cassette or vector can be introduced into a recipientcell to create a transformed cell. A preselected DNA segment may bedelivered into cells or tissues, by currently available methodsincluding, but not limited to, infectious viruses, the use of liposomes,microinjection by mechanical or laser beam methods, by whole chromosomesor chromosome fragments, and electroporation.

The invention will be further described by the following examples.

EXAMPLE 1

Type II alveolar epithelial cells critically rely on lipids carried within circulating lipoproteins to adequately synthesize the surfactant lipid, disaturated phosphatidylcholine (DSPtdCho). Studies have not investigated feedback control mechanisms by which these cells maintain surfactant PtdCho homeostasis under conditions of lipid (sterol) deprivation.

Prior studies showed that fasting or lipid restriction in animal models transiently decreases surfactant levels and surface activity; however, levels are restored by 96 hrs. Fasting increases choline incorporation into DSPtdCho, suggesting that PtdCho synthesis increases as a compensatory mechanism in response to caloric restriction. CTP:phosphocholine cytidylyltransferase (CCT) is the rate-limiting enzyme involved in DSPtdCho synthesis, and CCT activity is stimulated by long-term lipoprotein deprivation. Also, published sequence information for the mouse CCT gene (Bakovic et al., BBA, 1438:147-65, 1999) indicates several potential binding sites (Sp1, SRE) that might confer sterol regulation.

The present studies were conducted primarily using a murine alveolar type II epithelial cell line (MLE-12). Cells were cultured in Hite's medium with 10% fetal bovine serum (FBS) or 10% lipoprotein-deficient serum (LPDS) supplemented with lovastatin (2 μg/ml) or cyclodextrin (100 μg/ml) for 72 hrs prior to analysis. Northern blotting was performed as described using CCTA specific probes (Mallampalli et al., JBC, 275:9699-9708, 2000). Amplification of varying portions of the proximal 5′ flanking region of the mouse CCT gene (CCTp298) was performed by PCR using mouse genomic DNA as a template. A pCR4-TOPO cloning vector (Invitrogen) and the reporter vector, pGL3 Basic (Promega), were used to generate CCT promoter-reporter constructs for transient transfections to assess reporter activity. Specifically, a proximal 5′ region of the gene was amplified by PCR using C57BL/6J mouse genomic DNA (100 ng) as a template. Primers (0.4 μM) 5′-CCCTCTGGAAGCGGAACTAC-3′ (left) (SEQ ID NO:3) and 5′-TCAACTCCTCCAGGCTCCGGT-3′ (right) (SEQ ID NO:4) were incubated in a reaction mixture using PCR Supermix (Life Technologies) containing 1.5 mM MgCl2, 200 μM dNTP, and Taq DNA polymerase (1.0 U per 50 μl of reaction solution). PCR conditions included an initial cycle of 94° C. for 2 min, followed by 35 cycles at 94° C. for 30 sec, 56° C. for 30 sec, and 72° C. for 2 min, plus a final extension at 72° C. for 10 min. Amplification resulted in a 208 bp product that was then cloned into a pCR4-TOPO cloning vector (Invitrogen). The product was digested with NotI and SpeI, gel purified, and sequenced. It was confirmed to be identical to the published sequence. This fragment was then directionally subcloned into a reporter construct (pGL3 Basic, Promega, Madison, Wis.) upstream of the firefly luciferase coding region. Expression was evaluated after transient transfection into a murine lung epithelial cell line (MLE) cultured in the presence of fetal bovine serum for 24 hrs.
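The relationship between the two primers and the 208 bp product can be checked with a simple in silico amplification, sketched below for illustration only. The template is SEQ ID NO:1 as set forth in the Summary, entered in blocks of ten nucleotides; the right primer is entered as a continuous 21-nucleotide string; the function names are hypothetical, and the search requires exact primer matches, which is a simplification of real primer annealing.

```python
# SEQ ID NO:1 (CCTp208), entered in blocks of ten for readability.
SEQ_ID_NO_1 = (
    "CCCTCTGGAA" "GCGGAACTAC" "TCTGTCAGGT" "TGTGGTTTTC" "AGGAATGCGG"
    "AGGTGGCATT" "GACAAGAGGG" "CGGGCGGGAG" "GCGGGACTTC" "CGGTCCGCAG"
    "TCCGGTCAGA" "TGTTTCCCGG" "GCGTCTCCCC" "GCAACCCATT" "TGACTTCGCT"
    "AGTCGGTGAC" "GCGGCGCGGG" "GAAGGGATCC" "GAGGGGGACC" "GGAGCCTGGA"
    "GGAGTTGA"
)
LEFT_PRIMER = "CCCTCTGGAAGCGGAACTAC"     # SEQ ID NO:3
RIGHT_PRIMER = "TCAACTCCTCCAGGCTCCGGT"   # SEQ ID NO:4

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def amplicon(template: str, left: str, right: str):
    """Product bounded by exact matches of the two primers, or None."""
    start = template.find(left)
    if start < 0:
        return None
    # The right primer anneals to the top strand as its reverse complement.
    right_site = template.find(reverse_complement(right), start)
    if right_site < 0:
        return None
    return template[start:right_site + len(right)]


product = amplicon(SEQ_ID_NO_1, LEFT_PRIMER, RIGHT_PRIMER)
print(len(product) if product else "no product")
```

In this sketch the left primer matches the first 20 nucleotides of SEQ ID NO:1 and the reverse complement of the right primer matches its last 21 nucleotides, so the predicted product spans the whole sequence.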

Effect of LPDS on CCT mRNA in MLE cells (FIG. 1). Cells were grown in Hite's medium with 10% FBS for 72 hrs before changing to fresh medium containing 10% FBS or 10% LPDS for 2 to 48 hrs. At the times indicated in FIG. 1, total cellular RNA (50 μg) was isolated and the amount of (A) CCT mRNA or (B) 18S RNA was determined by Northern analysis. Representative autoradiograms are shown. (C) CCT mRNA levels in FBS and LPDS exposed cells as determined by Northern analysis using 10 μg poly(A)-rich RNA. (D) Densitometric analysis of CCT mRNA in cells cultured in medium containing 10% FBS or 10% LPDS for 2 to 48 hrs. The CCT mRNA/18S ratios for FBS were arbitrarily assigned a value of 1. The densitometric data are from 3 independent experiments. Values are shown as mean±SEM.
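The normalization described for panel (D) can be sketched as follows. This is an illustrative example only: the optical density values are placeholders, not data from the study, and the function names are hypothetical. Each CCT mRNA signal is divided by its 18S signal, the FBS ratio is set to 1, and the resulting fold values are summarized as mean±SEM.

```python
from math import sqrt
from statistics import mean, stdev


def fold_over_control(cct_od, rrna_od, cct_od_ctrl, rrna_od_ctrl):
    """CCT mRNA/18S ratio of a sample relative to the control ratio (control = 1)."""
    return (cct_od / rrna_od) / (cct_od_ctrl / rrna_od_ctrl)


def mean_sem(values):
    """Return (mean, standard error of the mean) for two or more values."""
    return mean(values), stdev(values) / sqrt(len(values))


# Placeholder densitometry readings as (CCT mRNA OD, 18S OD).
# These numbers are hypothetical and chosen only to keep the arithmetic simple.
experiments = [
    {"fbs": (1.0, 2.0), "lpds": (1.5, 2.0)},
    {"fbs": (2.0, 4.0), "lpds": (4.0, 4.0)},
    {"fbs": (1.0, 1.0), "lpds": (2.5, 1.0)},
]

lpds_folds = [
    fold_over_control(*exp["lpds"], *exp["fbs"]) for exp in experiments
]
lpds_mean, lpds_sem = mean_sem(lpds_folds)
print(lpds_folds, lpds_mean, lpds_sem)
```

Normalizing within each experiment before averaging keeps blot-to-blot differences in exposure from inflating the SEM.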

The effect of LPDS on turnover of CCT mRNA in MLE cells (FIG. 2). Cells were grown in Hite's medium containing 10% FBS for 72 hrs before changing to fresh medium with either 10% FBS or 10% LPDS containing 5 μg/ml actinomycin D for 0 to 8 hrs. Northern analysis of total cellular RNA (50 μg) was performed to determine amounts of (A) CCT mRNA and (B) 18S RNA. (C) Densitometric analysis of autoradiograms shows CCT mRNA/18S OD ratios for cells grown in medium containing 5 μg/ml actinomycin D with either 10% FBS or 10% LPDS. Values are plotted on a logarithmic scale and expressed as mean±SEM from 3 independent experiments.

The effect of LPDS on CCT gene transcription in MLE cells (FIG. 3). Cells were grown in Hite's medium containing 10% FBS for 72 hrs before changing to fresh medium with either 10% FBS or 10% LPDS for 6 or 48 hrs. After culture, cells were harvested and processed for CCT transcriptional activity as described using a nuclear run-on assay (Am. J. Respir. Cell Mol. Biol., 20:751 (1999)). The data are representative of two independent experiments.

Deletional Analysis of the CCT Promoter in MLE Cells (FIG. 4). Transfections were conducted in Hite's medium with 0.5% FBS using Fugene 6 and 0.75 μg of test plasmid for 90 min, with six different luciferase vectors: 1) pGL3 Basic, a negative control, which contains no promoter; 2) pGL3 Promoter (pGL3 Pro), a positive control, which contains the SV40 promoter; and 3) CCTp1938 (−1867/+71), CCTp1054 (−983/+71), CCTp240 (−169/+71), and CCTp90 (−19/+71), the experimental vectors containing fragments of the CCT promoter cloned into pGL3 Basic. Immediately after transfection, cells were exposed to medium containing 10% FBS for 18 hr. Luciferase and β-galactosidase activities in cellular extracts were determined by luminometer readings, and values were normalized for transfection efficiency by co-transfection with CMV-β-gal (0.25 μg). Data are from n=7 studies.
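The construct names encode fragment length, which can be checked against the stated coordinates. The sketch below is illustrative and assumes the usual numbering convention in which positions are counted relative to the transcriptional start site (+1) and there is no position 0; the function name `fragment_length` is hypothetical.

```python
# Coordinates (5' end, 3' end) as given for the experimental vectors.
CONSTRUCTS = {
    "CCTp1938": (-1867, 71),
    "CCTp1054": (-983, 71),
    "CCTp240": (-169, 71),
    "CCTp90": (-19, 71),
}


def fragment_length(start, end):
    """Inclusive length in bp on a coordinate scale that has no position 0."""
    if start == 0 or end == 0:
        raise ValueError("position 0 does not exist in this numbering")
    if start > end:
        raise ValueError("start must not exceed end")
    length = end - start + 1
    if start < 0 < end:
        length -= 1  # the scale jumps directly from -1 to +1
    return length


for name, (start, end) in CONSTRUCTS.items():
    named = int(name[len("CCTp"):])  # length implied by the construct name
    print(name, fragment_length(start, end), named)
```

Under this convention each computed length matches the number in the construct name, for example 169 + 71 = 240 for CCTp240.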

LPDS Activates the CCT Promoter in MLE Cells (FIG. 5). Transfections were conducted as in FIG. 4 using pGL3 Basic as a negative control and CCTp1938, CCTp1054, CCTp240, and CCTp90 as experimental vectors. After transfection, cells were exposed to medium containing 10% FBS or LPDS for 18 hr. Luciferase and β-galactosidase activities in cellular extracts were determined by luminometer readings, and values were normalized for transfection efficiency by co-transfection with CMV-β-gal. The data are representative of n=6 separate studies.
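One common way to apply the normalization described above is to divide each luciferase reading by the β-galactosidase reading from the same extract and then compare LPDS with FBS for each vector. The sketch below assumes that ratio-based approach, which the text does not spell out; the readings are placeholders, not data from the study, and the function names are hypothetical.

```python
def normalized_activity(luciferase, beta_gal):
    """Luciferase signal corrected for transfection efficiency (assumed ratio method)."""
    if beta_gal <= 0:
        raise ValueError("beta-galactosidase reading must be positive")
    return luciferase / beta_gal


def fold_activation(treated, control):
    """Normalized activity under treatment relative to the control condition."""
    return treated / control


# Placeholder readings as (luciferase, beta-galactosidase); hypothetical values only.
readings = {
    ("pGL3 Basic", "FBS"): (100.0, 2.0),
    ("pGL3 Basic", "LPDS"): (125.0, 2.5),
    ("CCTp240", "FBS"): (8000.0, 2.0),
    ("CCTp240", "LPDS"): (18000.0, 3.0),
}

activation = {}
for vector in ("pGL3 Basic", "CCTp240"):
    fbs = normalized_activity(*readings[(vector, "FBS")])
    lpds = normalized_activity(*readings[(vector, "LPDS")])
    activation[vector] = fold_activation(lpds, fbs)

print(activation)
```

Dividing by the co-transfected β-galactosidase signal first means a difference in raw luciferase counts that only reflects transfection efficiency, as in the placeholder pGL3 Basic rows, yields no apparent activation.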

LPDS Activation of the CCT Promoter is Not Species Specific (FIG. 6). Transfection conditions were identical to those described in FIG. 5. Expression of the CCTp240 experimental vector in the murine (MLE-12) type 2 cell line was compared to expression in human (A549 and H441) cell lines and an immortalized fetal rat (IFT2) cell line. After transfection, cells were exposed to medium containing FBS or LPDS, and values were normalized for transfection efficiency as in FIG. 4. The data are from n=3 separate studies.

LPDS Activation of the CCT Promoter Requires Intact SRE (FIG. 7). Transfection conditions were identical to those described in FIG. 5. MLE cells were transfected with a wild-type or mutated CCTp240 experimental vector in which the SRE element was modified. The putative sterol regulatory element (SRE) within the 5′ flanking region (−156/−147 bp relative to the first transcriptional start site) of the CCT gene, GTCACCCCAC (SEQ ID NO:5), was mutated to GTAAACCCAC (SEQ ID NO:6) using the QuikChange Site-Directed Mutagenesis Kit (Stratagene, La Jolla, Calif.) and primers 5′-CCGCTCAGTAAACCCACGCGCCCGG-3′ (left) (SEQ ID NO:7) and 5′-CCGGGCGCGTGGGTTTACTGAGCCG-3′ (right) (SEQ ID NO:8). After transfection, cells were exposed to medium containing FBS or LPDS, and values were normalized for transfection efficiency as in FIG. 4. The data are from n=3 separate studies.
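The base changes introduced into the SRE, and their presence in the two mutagenesis primers, can be illustrated with a short comparison. This is a minimal sketch using only the sequences given above; the helper names are hypothetical, and positions are counted from 1 within the 10 bp element rather than relative to the transcriptional start site.

```python
WILD_TYPE_SRE = "GTCACCCCAC"              # SEQ ID NO:5
MUTANT_SRE = "GTAAACCCAC"                 # SEQ ID NO:6
LEFT_PRIMER = "CCGCTCAGTAAACCCACGCGCCCGG"   # SEQ ID NO:7
RIGHT_PRIMER = "CCGGGCGCGTGGGTTTACTGAGCCG"  # SEQ ID NO:8

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def substitutions(original, mutated):
    """List (position, original base, new base) for each mismatch, 1-based."""
    if len(original) != len(mutated):
        raise ValueError("sequences must have equal length")
    return [
        (i + 1, a, b)
        for i, (a, b) in enumerate(zip(original, mutated))
        if a != b
    ]


changes = substitutions(WILD_TYPE_SRE, MUTANT_SRE)
left_carries_mutation = MUTANT_SRE in LEFT_PRIMER
right_carries_mutation = reverse_complement(MUTANT_SRE) in RIGHT_PRIMER
print(changes, left_carries_mutation, right_carries_mutation)
```

The comparison shows two C-to-A substitutions within the element, and the mutant element appears in the left primer and, as its reverse complement, in the right primer.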

Promoter Activity of CCT (FIG. 8). A murine lung epithelial (MLE) cell line (closed bars) or a hepatoma (HepG2) line (hatched bars) was transfected with one of three different promoter constructs: 1) a 240 bp promoter fragment of the CCT gene (CCTp240), 2) a novel 208 bp promoter fragment of the CCT gene, or 3) an SV40 strong viral promoter. Transfections were conducted for 90 min with Fugene 6 in Hite's medium containing 0.5% fetal bovine serum. After 18 hrs, cells were harvested and luciferase and β-galactosidase activities were assayed. Luminometer readings were normalized for transfection efficiency by co-transfection of CMV-β-galactosidase. Similar results were observed in two additional studies.

LPDS Activates the CCT Promoter in MLE Cells (FIG. 9). Transfections were conducted as in FIG. 6. After transfection, cells were exposed to medium containing 10% FBS or LPDS for 18 hr. For comparison, the effects of LPDS on activation of a positive control plasmid encoding the human low-density lipoprotein receptor (LDLR) promoter coupled to luciferase were tested. The effects of LPDS on a CCT promoter luciferase construct harboring mutations within the candidate sterol-regulatory element (mCCTp240) were also tested. Luciferase and β-galactosidase activities in cellular extracts were determined in a luminometer. The data are representative of n=6 separate studies (inset n=3). Thus, FIG. 9 shows that the mCCTp240 promoter is considerably stronger than the commonly used LDLR promoter.

All publications, patents and patent documents are incorporated by reference herein, as though individually incorporated by reference. The invention has been described with reference to various specific and preferred embodiments and techniques. However, it should be understood that many variations and modifications may be made while remaining within the scope of the invention.

1. An isolated and purified DNA molecule comprising a preselected DNA segment comprising a CTP:phosphocholine cytidylyltransferase (CCT) p208 promoter or a mutant CCT p240 promoter.
2. The DNA molecule of claim 1, wherein the preselected DNA segment comprises SEQ ID NO:1 or SEQ ID NO:2.
3. The DNA molecule of claim 1, wherein the preselected DNA segment exhibits at least about a 2-fold increase in expression above a PGL3Pro promoter.
4. The DNA molecule of claim 1, wherein the preselected DNA segment exhibits at least about a 10-fold increase in expression above a PGL3Pro promoter.
5. The DNA molecule of claim 1, wherein the preselected DNA segment exhibits at least about a 20-fold increase in expression above a PGL3Pro promoter.
6. The DNA molecule of claim 1, wherein the preselected DNA segment exhibits at least about a 30-fold increase in expression above a PGL3Pro promoter.
7. An expression cassette comprising a first preselected DNA segment that comprises the DNA molecule of claim 1 that is functional in a host cell and is operably linked to a second preselected DNA segment encoding a protein, RNA transcript, or a combination thereof.
8. The expression cassette of claim 7, which further comprises an enhancer element.
9. The expression cassette of claim 7, wherein the second preselected DNA segment comprises a selectable marker gene or a reporter gene.
10. The expression cassette of claim 7, wherein the host cell is a eukaryotic cell.
11. The expression cassette of claim 10, wherein the host cell is an animal cell.
12. The expression cassette of claim 10 wherein the host cell is a mammalian cell.
13. The expression cassette of claim 10 wherein the host cell is a human cell.
14. The expression cassette of claim 10 wherein the host cell is a plant cell.
15. A method for producing transformed cells comprising the steps of (i) introducing into cells a recombinant DNA segment which comprises a first preselected DNA segment comprising a CCTp208 promoter or a mCCTp240 promoter operably linked to a second preselected DNA segment so as to yield transformed cells, and (ii) identifying or selecting a transformed cell line.
16. The method of claim 15, wherein the first preselected DNA segment comprises SEQ ID NO:1 or SEQ ID NO:2.
17. The method of claim 15, wherein the recombinant DNA segment is expressed so as to impart a phenotypic characteristic to the transformed cells.
18. The method of claim 15, wherein the second preselected DNA segment comprises a selectable marker gene or a reporter gene.
19. The host cell made by the method of claim 15.
20. A host cell comprising the isolated and purified DNA molecule of claim 1.
21. A host cell comprising the expression cassette of claim 7.